WO 02/066640 PCT/EP02/01862

ZINC FINGER BINDING DOMAINS FOR NUCLEOTIDE SEQUENCE ANN

Technical Field of the Invention 

The field of this invention is zinc finger protein binding to target nucleotides. More
particularly, the present invention pertains to amino acid residue sequences within the
α-helical domain of zinc fingers that specifically bind to target nucleotides of the formula
5'-(ANN)-3'.

Background of the Invention

The construction of artificial transcription factors has been of great interest in
recent years. Gene expression can be specifically regulated by polydactyl zinc finger
proteins fused to regulatory domains.

Zinc finger domains of the Cys2-His2 family have been most promising for the
construction of artificial transcription factors due to their modular structure. Each domain
consists of approximately 30 amino acids and folds into a ββα structure stabilized by
hydrophobic interactions and chelation of a zinc ion by the conserved Cys2-His2 residues.
To date, the best characterized protein of this family of zinc finger proteins is the mouse
transcription factor Zif268 [Pavletich et al., (1991) Science 252(5007), 809-817;
Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180]. The analysis of the Zif268/DNA
complex suggested that DNA binding is predominantly achieved by the interaction of
amino acid residues of the α-helix in positions -1, 3, and 6 with the 3', middle, and 5'
nucleotides of a 3 bp DNA subsite, respectively. Positions 1, 2 and 5 have been shown to
make direct or water-mediated contacts with the phosphate backbone of the DNA. Leucine
is usually found in position 4 and packs into the hydrophobic core of the domain. Position
2 of the α-helix has been shown to interact with other helix residues and, in addition, can
make contact with a nucleotide outside the 3 bp subsite [Pavletich et al., (1991) Science
252(5007), 809-817; Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180; Isalan et al.,
(1997) Proc Natl Acad Sci USA 94(11), 5617-5621].

The selection of modular zinc finger domains recognizing each of the 5'-GNN-3'
DNA subsites with high specificity and affinity and their refinement by site-directed
mutagenesis has been demonstrated. These modular domains can be assembled into zinc
finger proteins recognizing extended 18 bp DNA sequences which are unique within the
human or any other genome. In addition, these proteins function as transcription factors
and are capable of altering gene expression when fused to regulatory domains and can even
be made hormone-dependent by fusion to ligand-binding domains of nuclear hormone
receptors. To allow the rapid construction of zinc finger-based transcription factors
binding to any DNA sequence it is important to extend the existing set of modular zinc
finger domains to recognize each of the 64 possible DNA triplets. This aim can be
achieved by phage display selection and/or rational design.

Due to the limited structural data on zinc finger/DNA interactions, rational design of
zinc finger proteins is very time consuming and may not be possible in many instances. In
addition, most naturally occurring zinc finger proteins consist of domains recognizing the
5'-GNN-3' type of DNA sequences. Only a few zinc finger domains binding to sequences
of the 5'-ANN-3' type are found in naturally occurring proteins, such as finger 5 (5'-AAA-3')
of Gfi-1 [Zweidler-McKay et al., (1996) Mol. Cell. Biol. 16(8), 4024-4034], finger 3
(5'-AAT-3') of YY1 [Hyde-DeRuyscher et al., (1995) Nucleic Acids Res. 23(21), 4457-4465],
fingers 4 and 6 (5'-[A/G]TA-3') of CF2II [Gogos et al., (1996) PNAS 93, 2159-2164] and
finger 2 (5'-AAG-3') of TTK [Fairall et al., (1993) Nature (London) 366(6454), 483-7].
However, in structural analyses of protein/DNA complexes by X-ray or NMR studies,
interaction of the amino acid residue in position 6 of the α-helix with a nucleotide other
than a 5' guanine has never been observed. Therefore, the most promising approach to identify
novel zinc finger domains binding to DNA target sequences of the type 5'-ANN-3',
5'-CNN-3' or 5'-TNN-3' is selection via phage display. The limiting step for this approach is
the construction of libraries that allow the specification of a 5' adenine, cytosine or
thymine. Phage display selections have been based on Zif268, in which different
fingers of this protein were randomized [Choo et al., (1994) Proc. Natl. Acad. Sci. U.S.A.
91(23), 11168-72; Rebar et al., (1994) Science (Washington, D.C., 1883-) 263(5147),
671-3; Jamieson et al., (1994) Biochemistry 33, 5689-5695; Wu et al., (1995) PNAS 92,
344-348; Jamieson et al., (1996) Proc Natl Acad Sci USA 93, 12834-12839; Greisman et
al., (1997) Science 275(5300), 657-661]. A set of 16 domains recognizing the 5'-GNN-3'
type of DNA sequences has previously been reported from a library in which finger 2 of C7, a
derivative of Zif268 [United States Patent No. 6,140,081, the disclosure of which is
incorporated herein by reference; Wu et al., (1995) PNAS 92, 344-348],
was randomized [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. In such
a strategy, selection is limited to domains recognizing 5'-GNN-3' or 5'-TNN-3' due to the
Asp² of finger 3 making contact with the base complementary to a 5' guanine or thymine in
the finger-2 subsite [Pavletich et al., (1991) Science 252(5007), 809-817; Elrod-Erickson et
al., (1996) Structure 4(10), 1171-1180]. The limited modularity of zinc finger domains,
which may in some cases recognize a nucleotide outside the 3 bp subsite, has been
discussed intensively [Wolfe et al., (1999) Annu. Rev. Biophys. Biomol. Struct. 3, 183-212;
Segal et al., (2000) Curr Opin Chem Biol 4(1), 34-39; Pabo et al., (2000) J. Mol. Biol. 301,
597-624; Choo et al., (2000) Curr. Opin. Struct. Biol. 10, 411-416]. One approach to
overcome the limitations imposed by target site overlap is the randomization of amino acid
residues in two adjacent fingers [Jamieson et al., (1996) Proc Natl Acad Sci USA 93,
12834-12839; Isalan et al., (1998) Biochemistry 37(35), 12026-12033]. A second, but
time-consuming, approach is the sequential selection of fingers 1 to 3 for a specific 9 bp
target site, which accounts for the individual structure and mode of DNA binding of each
finger and its surrounding fingers [Greisman et al., (1997) Science 275(5300), 657-661;
Wolfe et al., (1999) J Mol Biol 285(5), 1917-1934].

The present approach is based on the modularity of zinc finger domains, which allows
the rapid construction of zinc finger proteins by the scientific community, and demonstrates
that the concerns regarding limitations imposed by cross-subsite interactions arise in only
a limited number of cases. The present disclosure introduces a new strategy for selection
of zinc finger domains specifically recognizing the 5'-ANN-3' type of DNA sequences.
Specific DNA-binding properties of these domains were evaluated by a multi-target ELISA
against all sixteen 5'-ANN-3' triplets. These domains can be readily incorporated into
polydactyl proteins containing various numbers of 5'-ANN-3' domains, each specifically
recognizing extended 18 bp sequences. Furthermore, these domains were able to
specifically alter gene expression when fused to regulatory domains. These results
underline the feasibility of constructing polydactyl proteins from pre-defined building
blocks. In addition, the domains characterized here greatly increase the number of DNA
sequences that can be targeted with artificial transcription factors.

Brief Summary of the Invention

The present disclosure teaches the construction of a novel phage display library
enabling the selection of zinc finger domains recognizing the 5'-ANN-3' type of DNA
sequences. Such domains were isolated and showed exquisite binding specificity for the 3
bp target site against which they were selected. These zinc finger domains were
engrafted into 6-finger proteins which bound specifically to their 18 bp target site with
affinities in the pM to lower nM range. When fused to regulatory domains, one artificial
6-finger protein containing five 5'-ANN-3' domains and one 5'-TNN-3' domain regulated a
luciferase reporter gene under control of a minimal promoter containing the zinc
finger-binding site and a TATA-box. In addition, 6-finger proteins assembled from 5'-ANN-3'
and 5'-GNN-3' domains showed specific transcriptional regulation of the endogenous erbB-2
and erbB-3 genes, respectively. These results show that modular zinc finger domains
binding to 3 bp target sites other than 5'-GNN-3' can be selected and that they are suitable
as additional modules to create artificial transcription factors, thereby greatly increasing
the number of sequences that can be targeted by DNA-binding proteins built from
pre-defined zinc finger domains.

Thus, the present invention provides an isolated and purified polypeptide that
contains from 2 to 12 zinc finger-nucleotide binding peptides, at least one of which
peptides contains a nucleotide binding region having the sequence of any of SEQ ID NO:
7-71 and 107-112. In a preferred embodiment, the polypeptide contains from 2 to 6 zinc
finger-nucleotide binding peptides. Such a polypeptide binds to a nucleotide that contains
the sequence 5'-(ANN)n-3', wherein each N is A, C, G, or T and where n is 2 to 12.
Preferably, each of the peptides binds to a different target nucleotide sequence. A 
polypeptide of this invention can be operatively linked to one or more transcription 
regulating factors such as a repressor or an activator. 

Polynucleotides that encode the polypeptides, expression vectors containing the 
polynucleotides and cells transformed with expression vectors are also provided. 

In a related aspect, the present invention provides a process of regulating expression
of a nucleotide sequence that contains the sequence 5'-(ANN)n-3', where n is an integer
from 2 to 12. The process includes the step of exposing the nucleotide sequence to an
effective amount of a polypeptide of this invention under conditions in which the
polypeptide binds to expression regulating sequences of the nucleotide. Thus, the sequence
5'-(ANN)n-3' can be located in the transcribed region of the nucleotide sequence, a
promoter region of the nucleotide sequence or within an expressed sequence tag. A
polypeptide is preferably operatively linked to one or more transcription regulating factors.

Brief Description of the Drawings 

Fig. 1 shows, schematically, construction of the zinc finger phage display library.
Solid arrows show interactions of the amino acid residues of the zinc finger helices with
the nucleotides of their binding site as determined by X-ray crystallography of Zif268 and
dotted lines show proposed interactions.

Fig. 2 shows amino acid sequences of finger-2 recognition helices from selected
clones. For each DNA target site several single clones were sequenced after the sixth
round of panning and the amino acid sequences determined to evaluate the selection. The DNA
recognition subsite of finger 2 is shown on the left of each set, followed by the number of
occurrences of each sequence. The position of the amino acid residue within the α-helix is
shown at the top. Boxed sequences were studied in detail and represent the best binders of
each set. Sequences marked with an asterisk are additional clones that were analyzed.
^Clones with a Ser"* to Cys^ mutation in finger 3. ^Sequences determined after subcloning
the zinc finger sequences from the DNA pool after the sixth round of selection into a
modified pMAL-c2 vector.

Fig. 3 (shown in 26 panels: 3a-3z) shows the multi-target specificity assay used to study
DNA-binding properties of selected domains. At the top of each graph is the amino acid
sequence of the finger-2 domain (positions -2 to 6 with respect to the helix start) of the
3-finger protein analyzed. Black bars represent binding to target oligonucleotides with
different finger-2 subsites: AAA, AAC, AAG, AAT, ACA, ACC, ACG, ACT, AGA, AGC, AGG,
AGT, ATA, ATC, ATG, and ATT. White bars represent binding to a set of
oligonucleotides where the finger-2 subsite differs only in the 5' position, for example for
the domain binding the 5'-AAA-3' subsite (Fig. 3a) AAA, CAA, GAA, or TAA, to evaluate
the 5' recognition. The height of each bar represents the relative affinity of the protein for
each target, averaged over two independent experiments and normalized to the highest
signal among the black or white bars. Error bars represent the deviation from the average.
Proteins analyzed correspond to the boxed helix sequences from Fig. 2. *: Proteins
containing a finger-2 domain which was generated by site-directed mutagenesis.

Fig. 4 (shown in 2 panels: A and B) shows the construction of six-finger proteins
containing domains recognizing 5'-ANN-3' DNA sequences and ELISA analysis. A: The
six-finger proteins pAart, pE2X, pE3Y and pE3Z were constructed using the Sp1C
framework. Amino acid residues in positions -1 to 6 of the α-recognition helix are given for
each finger that was utilized. B: Proteins were expressed in E. coli as MBP fusion proteins.
Specificity of binding was analyzed by measurement of the binding activity from crude
lysates to immobilized biotinylated oligonucleotides (E2X, 5'-ACC GGA GAA ACC AGG
GGA-3' (SEQ ID NO: 72); E3Y, 5'-ATC GAG GCA AGA GCC ACC-3' (SEQ ID NO:
73); E3Z, 5'-GCC GCA GCA GCC ACC AAT-3' (SEQ ID NO: 74); Aart, 5'-ATG-TAG-
AGA-AAA-ACC-AGG-3' (SEQ ID NO: 75)). Assays were performed in duplicate, with bars
representing the standard deviation. Black bars: pE2X; striped bars: pE3Y; gray bars:
pE3Z; white bars: pAart.
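
As an illustration only, the four 18 bp target sites listed for Fig. 4 can be split into their six 3 bp subsites and grouped by 5' nucleotide. The sketch below is not part of the disclosed method; the function names `split_triplets` and `subsite_classes` are hypothetical, and only the sequences themselves are taken from the text.

```python
from collections import Counter

# 18 bp target sites copied from the Fig. 4 description (written 5' to 3').
TARGET_SITES = {
    "E2X": "ACC GGA GAA ACC AGG GGA",
    "E3Y": "ATC GAG GCA AGA GCC ACC",
    "E3Z": "GCC GCA GCA GCC ACC AAT",
    "Aart": "ATG-TAG-AGA-AAA-ACC-AGG",
}


def split_triplets(site):
    """Return the 3 bp subsites of a target site, ignoring spaces and hyphens."""
    seq = "".join(ch for ch in site.upper() if ch in "ACGT")
    if len(seq) % 3 != 0:
        raise ValueError("target site length must be a multiple of 3 bp")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def subsite_classes(site):
    """Count subsites by 5' nucleotide, e.g. 'ANN' or 'GNN' (hypothetical helper)."""
    return Counter(triplet[0] + "NN" for triplet in split_triplets(site))


if __name__ == "__main__":
    for name, site in TARGET_SITES.items():
        print(name, split_triplets(site), dict(subsite_classes(site)))
```

For the Aart site this grouping gives five 5'-ANN-3' subsites and one 5'-TNN-3' subsite, in agreement with the composition stated in the Brief Summary.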

Fig. 5 (shown in 2 panels: A and B) shows luciferase reporter assay results. HeLa
cells were cotransfected with the indicated zinc finger expression plasmid (pcDNA as
control) and a reporter plasmid containing a luciferase gene under the control of a minimal
promoter with TATA-box and zinc finger-binding sites (A: 5 x Aart binding sites; B: 6 x
2C7 binding sites). Luciferase activity in cell extracts was measured 48 h after transfection.
Each bar represents the mean value (+/- standard deviation) of duplicate measurements.
Y-axis: light units divided by 10^. X-axis: constructs coding for zinc finger proteins
transfected; control, reporter alone.

Fig. 6 (shown in 2 panels: A and B) shows retrovirus-mediated gene targeting.
A431 cells were infected with retrovirus encoding pE2X (A) or pE3Y (B) fused to
either the activation domain VP64 or the repression domain KRAB. Three days
later, intact cells were stained with the ErbB-1-specific mAb EGFR-1, the ErbB-2-specific
mAb FSP77, or the ErbB-3-specific mAb SGP1 in combination with phycoerythrin-labeled
secondary antibody. Dotted lines: control staining (primary antibody omitted); dashed
lines: specific staining of mock-infected cells; dotted/dashed lines: cells expressing zinc
finger protein-VP64 fusions; solid lines: cells expressing zinc finger protein-KRAB
fusions.

Detailed Description of the Invention
I. Zinc Finger Polypeptides

The present invention provides isolated and purified polypeptides that contain from
2 to 12 nucleotide binding domain peptides derived from zinc finger proteins. The
nucleotide binding domain peptides are derived from the α-helical portion of the zinc finger
proteins. Preferred such nucleotide binding domain peptides have the amino acid residue
sequence of any of SEQ ID NOs: 7-71 or 107-112. Preferably, the peptide has the amino
acid residue sequence of any of SEQ ID NOs: 46-70. More preferably, the peptide has the
amino acid residue sequence of any of SEQ ID NOs: 10, 11, 17, 19, 21, 23-30, 32, 34-36,
42, 43 or 45. Each of the peptides is designed and made to specifically bind nucleotide
target sequences corresponding to the formula 5'-ANN-3', where N is any nucleotide (i.e.,
A, C, G or T). Thus, a polypeptide of this invention binds to a nucleotide sequence
5'-(ANN)n-3', where n is an integer from 2 to 12. Preferably, n is from 2 to 6.

A compound of this invention is an isolated zinc finger-nucleotide binding
polypeptide that binds to an ANN nucleotide sequence and modulates the function of that
nucleotide sequence. The polypeptide can enhance or suppress transcription of a gene, and
can bind to DNA or RNA. A zinc finger-nucleotide binding polypeptide refers to a
polypeptide which is a mutagenized form of a zinc finger protein or one produced through
recombination. A polypeptide may be a hybrid which contains zinc finger domain(s) from
one protein linked to zinc finger domain(s) of a second protein, for example. The domains
may be wild type or mutagenized. A polypeptide includes a truncated form of a wild type
zinc finger protein. Examples of zinc finger proteins from which a polypeptide can be
produced include TFIIIA and Zif268.

A zinc finger-nucleotide binding polypeptide of this invention comprises a unique
heptamer (contiguous sequence of 7 amino acid residues) within the α-helical domain of
the polypeptide, which heptameric sequence determines binding specificity to a target
nucleotide. That heptameric sequence can be located anywhere within the α-helical domain
but it is preferred that the heptamer extend from position -1 to position 6 as the residues are
conventionally numbered in the art. A polypeptide of this invention can include any
β-sheet and framework sequences known in the art to function as part of a zinc finger protein.
A large number of zinc finger-nucleotide binding polypeptides were made and tested for
binding specificity against target nucleotides containing an ANN triplet.

The zinc finger-nucleotide binding polypeptide derivative can be derived or
produced from a wild type zinc finger protein by truncation or expansion, or as a variant of
the wild type-derived polypeptide by a process of site-directed mutagenesis, or by a
combination of the procedures. The term "truncated" refers to a zinc finger-nucleotide
binding polypeptide that contains fewer than the full number of zinc fingers found in the
native zinc finger binding protein or from which non-desired sequences have been deleted.
For example, truncation of the zinc finger-nucleotide binding protein TFIIIA, which naturally
contains nine zinc fingers, might yield a polypeptide with only zinc fingers one through three.
Expansion refers to a zinc finger polypeptide to which additional zinc finger modules have
been added. For example, TFIIIA may be extended to 12 fingers by adding 3 zinc finger
domains. In addition, a truncated zinc finger-nucleotide binding polypeptide may include
zinc finger modules from more than one wild type polypeptide, thus resulting in a "hybrid"
zinc finger-nucleotide binding polypeptide.

The term "mutagenized" refers to a zinc finger derived-nucleotide binding
polypeptide that has been obtained by performing any of the known methods for
accomplishing random or site-directed mutagenesis of the DNA encoding the protein. For
instance, in TFIIIA, mutagenesis can be performed to replace nonconserved residues in one
or more of the repeats of the consensus sequence. Truncated zinc finger-nucleotide binding
proteins can also be mutagenized.

Examples of known zinc finger-nucleotide binding polypeptides that can be
truncated, expanded, and/or mutagenized according to the present invention in order to
inhibit the function of a nucleotide sequence containing a zinc finger-nucleotide binding
motif include TFIIIA and Zif268. Other zinc finger-nucleotide binding proteins will be
known to those of skill in the art.

A polypeptide of this invention can be made using a variety of standard techniques
well known in the art. Phage display libraries of zinc finger proteins were created and
selected under conditions that favored enrichment of sequence-specific proteins. Zinc
finger domains recognizing a number of sequences required refinement by site-directed
mutagenesis that was guided by both phage selection data and structural information.
Previously we reported the characterization of 16 zinc finger domains specifically
recognizing each of the 5'-GNN-3' type of DNA sequences, which were isolated by phage
display selections based on C7, a variant of the mouse transcription factor Zif268, and
refined by site-directed mutagenesis [Segal et al., (1999) Proc Natl Acad Sci USA 96(6),
2758-2763; Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. The molecular interaction of
Zif268 with its target DNA 5'-GCG TGG GCG-3' (SEQ ID NO: 76) has been
characterized in great detail. In general, the specific DNA recognition of zinc finger
domains of the Cys2-His2 type is mediated by the amino acid residues -1, 3, and 6 of each
α-helix, although not in every case are all three residues contacting a DNA base. One
dominant cross-subsite interaction has been observed from position 2 of the recognition
helix. Asp² has been shown to stabilize the binding of zinc finger domains by directly
contacting the complementary adenine or cytosine of the 5' thymine or guanine,
respectively, of the following 3 bp subsite. These non-modular interactions have been
described as target site overlap. In addition, other interactions of amino acids with
nucleotides outside the 3 bp subsites creating extended binding sites have been reported
[Pavletich et al., (1991) Science 252(5007), 809-817; Elrod-Erickson et al., (1996)
Structure 4(10), 1171-1180; Isalan et al., (1997) Proc Natl Acad Sci USA 94(11), 5617-
5621].

Selection of the previously reported phage display library for zinc finger domains
binding to 5' nucleotides other than guanine or thymine met with no success, due to the
cross-subsite interaction from aspartate in position 2 of the finger-3 recognition helix
RSD-E-RKR. To extend the availability of zinc finger domains for the construction of artificial
transcription factors, domains specifically recognizing the 5'-ANN-3' type of DNA
sequences were selected. Other groups have described a sequential selection method which
led to the characterization of domains recognizing four 5'-ANN-3' subsites, 5'-AAA-3',
5'-AAG-3', 5'-ACA-3', and 5'-ATA-3' [Greisman et al., (1997) Science 275(5300), 657-
661; Wolfe et al., (1999) J Mol Biol 285(5), 1917-1934]. The present disclosure uses a
different approach to select zinc finger domains recognizing such sites by eliminating the
target site overlap. First, finger 3 of C7 (RSD-E-RKR) (SEQ ID NO: 3) binding to the
subsite 5'-GCG-3' was exchanged with a domain which did not contain aspartate in
position 2 (Fig. 1). The helix TSQ-N-LVR (SEQ ID NO: 6), previously characterized in
finger 2 position to bind with high specificity to the triplet 5'-GAT-3', seemed a good
candidate. This 3-finger protein (C7.GAT; Fig. 1), containing fingers 1 and 2 of C7 and the
5'-GAT-3'-recognition helix in finger-3 position, was analyzed for DNA-binding
specificity on targets with different finger-2 subsites by multi-target ELISA in comparison
with the original C7 protein (C7.GCG). Both proteins bound to the 5'-TGG-3' subsite
(note that C7.GCG also binds to 5'-GGG-3' due to the 5' specification of thymine or
guanine by Asp² of finger 3, which has been reported earlier).

The recognition of the 5' nucleotide of the finger-2 subsite was evaluated using a
mixture of all 16 5'-XNN-3' target sites (X = adenine, guanine, cytosine or thymine).
Indeed, while the original C7.GCG protein specified a guanine or thymine in the 5'
position of finger 2, C7.GAT did not specify a base, indicating that the cross-subsite
interaction to the adenine complementary to the 5' thymine was abolished. A similar effect
has previously been reported for variants of Zif268 where Asp² was replaced by Ala² by
site-directed mutagenesis [Isalan et al., (1997) Proc Natl Acad Sci USA 94(11), 5617-
5621; Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. The affinity of C7.GAT, measured
by gel mobility shift analysis, was found to be relatively low, about 400 nM compared to 0.5
nM for C7.GCG [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763], which
may in part be due to the lack of the Asp² in finger 3.

Based on the 3-finger protein C7.GAT, a library was constructed in the phage
display vector pComb3H [Barbas et al., (1991) Proc. Natl. Acad. Sci. USA 88, 7978-7982;
Rader et al., (1997) Curr. Opin. Biotechnol. 8(4), 503-508]. Randomization involved
positions -1, 1, 2, 3, 5, and 6 of the α-helix of finger 2 using a VNS codon doping strategy
(V = adenine, cytosine or guanine; N = adenine, cytosine, guanine or thymine; S = cytosine
or guanine). This allowed 24 codon possibilities for each randomized amino acid position,
while the aromatic amino acids Trp, Phe, and Tyr, as well as stop codons, were excluded
in this strategy. Because Leu is predominantly found in position 4 of the recognition
helices of zinc finger domains of the type Cys2-His2, this position was not randomized.
After transformation of the library into ER2537 cells (New England Biolabs) the library
contained 1.5 x 10^9 members. This exceeded the necessary library size by 60-fold and was
sufficient to contain all amino acid combinations.
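
The VNS doping scheme described above can be checked by enumeration. The following is a minimal sketch, assuming the standard genetic code; the names `vns_codons` and `CODON_TABLE` are illustrative and do not come from the disclosure.

```python
from itertools import product

# Standard genetic code, indexed with bases in the order T, C, A, G
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def vns_codons():
    """Enumerate VNS codons: V = A/C/G, N = A/C/G/T, S = C/G."""
    return ["".join(c) for c in product("ACG", "ACGT", "CG")]


codons = vns_codons()
encoded = sorted({CODON_TABLE[c] for c in codons})

if __name__ == "__main__":
    print("codons per position:", len(codons))
    print("amino acids encoded:", "".join(encoded))
    # Six helix positions (-1, 1, 2, 3, 5, 6) were randomized.
    print("codon combinations over six positions:", len(codons) ** 6)
```

Under this assumption the enumeration yields the 24 codons stated in the text, with no stop codons and without Trp, Phe or Tyr. Cysteine is also absent, because both of its codons begin with thymine.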

Six rounds of selection of zinc finger-displaying phage were performed binding to
each of the sixteen 5'-GAT-ANN-GCG-3' biotinylated hairpin target oligonucleotides,
respectively, in the presence of non-biotinylated competitor DNA. Stringency of the
selection was increased in each round by decreasing the amount of biotinylated target
oligonucleotide and increasing amounts of the competitor oligonucleotide mixtures. In the
sixth round the target concentration was usually 18 nM; 5'-CNN-3', 5'-GNN-3', and
5'-TNN-3' competitor mixtures were in 5-fold excess for each oligonucleotide pool,
respectively, and the specific 5'-ANN-3' mixture (excluding the target sequence) in 10-fold
excess. Phage binding to the biotinylated target oligonucleotide was recovered by capture
to streptavidin-coated magnetic beads.
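
The sixteen selection targets and the sixth-round competitor pools described above can be written out programmatically. This is a sketch under stated assumptions: the function names are hypothetical, the pools are listed as finger-2 subsites only (the full competitor oligonucleotide sequences are not given in the text), and the fold-excess values are those quoted for the sixth round.

```python
from itertools import product

BASES = "ACGT"


def finger2_subsites(five_prime):
    """All 16 triplets that share the given 5' nucleotide, e.g. 'A' -> ANN."""
    return [five_prime + "".join(pair) for pair in product(BASES, repeat=2)]


def target_site(subsite):
    """9 bp selection target with the finger-2 subsite in the middle position."""
    return f"5'-GAT-{subsite}-GCG-3'"


def round_six_competitors(target_subsite):
    """Competitor pools for one target, with fold excess as quoted in the text."""
    ann = finger2_subsites("A")
    if target_subsite not in ann:
        raise ValueError("target must be a 5'-ANN-3' subsite")
    pools = {
        base + "NN": {"subsites": finger2_subsites(base), "fold_excess": 5}
        for base in "CGT"
    }
    # The specific ANN pool leaves out the target sequence itself.
    pools["ANN"] = {
        "subsites": [s for s in ann if s != target_subsite],
        "fold_excess": 10,
    }
    return pools


if __name__ == "__main__":
    for subsite in finger2_subsites("A"):
        print(target_site(subsite))
    print(round_six_competitors("AAA")["ANN"])
```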

Clones were usually analyzed after the sixth round of selection. The amino acid
sequences of selected finger-2 helices were determined and generally showed good
conservation in positions -1 and 3 (Fig. 2), consistent with previously observed amino acid
residues in these positions [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-
2763]. Position -1 was Gln when the 3' nucleotide was adenine, with the exception of
domains binding 5'-ACA-3' (SPA-D-LTN) (SEQ ID NO: 77), where a Ser was strongly
selected. Triplets containing a 3' cytosine selected Asp⁻¹ (exceptions were domains binding
5'-AGC-3' and 5'-ATC-3'), a 3' guanine Arg⁻¹, and a 3' thymine Thr⁻¹ and His⁻¹. The
recognition of a 3' thymine by His⁻¹ has also been observed in finger 1 of TTK binding to
5'-GAT-3' (HIS-N-FCR) (SEQ ID NO: 78) [Fairall et al., (1993) Nature (London)
366(6454), 483-7]. For the recognition of a middle adenine, Asp and Thr were selected in
position 3 of the recognition helix. For binding to a middle cytosine, an Asp³ or Thr³ was
selected; for a middle guanine, His³ (an exception was recognition of 5'-AGT-3', which
may have a different binding mechanism due to the unusual amino acid residue His⁻¹); and
for a middle thymine, Ser³ and Ala³. Note also that the domains binding to 5'-ANG-3'
subsites contain Asp², which likely stabilizes the interaction of the 3-finger protein by
contacting the complementary cytosine of the 5' guanine in the finger-1 subsite. Even
though there was a predominant selection of Arg and Thr in position 5 of the recognition
helices, positions 1, 2 and 5 were variable.

The most interesting observation was the selection of amino acid residues in
position 6 of the α-helices, which determines binding to the 5' nucleotide of a 3 bp subsite. In
contrast to the recognition of a 5' guanine, where the direct base contact is achieved by Arg
or Lys in position 6 of the helix, no direct interaction has been observed in protein/DNA
complexes for any other nucleotide in the 5' position [Elrod-Erickson et al., (1996)
Structure 4(10), 1171-1180; Pavletich et al., (1993) Science (Washington, D.C., 1883-)
261(5129), 1701-7; Kim et al., (1996) Nat Struct Biol 3(11), 940-945; Fairall et al., (1993)
Nature (London) 366(6454), 483-7; Houbaviy et al., (1996) Proc Natl Acad Sci USA
93(24), 13577-82; Wuttke et al., (1997) J Mol Biol 273(1), 183-206; Nolte et al., (1998)
Proc Natl Acad Sci USA 95(6), 2938-2943]. Selection of domains against finger-2
subsites of the type 5'-GNN-3' had previously generated domains containing only Arg⁶,
which directly contacts the 5' guanine [Segal et al., (1999) Proc Natl Acad Sci USA 96(6),
2758-2763]. However, unlike the results for 5'-GNN-3' zinc finger domains, selections of
the phage display library against finger-2 subsites of the type 5'-ANN-3' identified
domains containing various amino acid residues in position 6: Ala⁶, Arg⁶, Asn⁶, Asp⁶,
Gln⁶, Glu⁶, Thr⁶ or Val⁶ (Fig. 2). In addition, one domain recognizing 5'-TAG-3' was
selected from this library with the amino acid sequence RED-N-LHT (Fig. 3z) (SEQ ID NO:
71). Thr⁶ is also present in finger 2 of Zif268 (RSD-H-LTT) (SEQ ID NO: 79) binding
5'-TGG-3', for which no direct contact was observed in the Zif268/DNA complex.

Finger-2 variants of C7.GAT were subcloned into a bacterial expression vector as
fusions with maltose-binding protein (MBP) and proteins were expressed by induction with
1 mM IPTG (proteins (p) are given the name of the finger-2 subsite against which they
were selected). Proteins were tested by enzyme-linked immunosorbent assay (ELISA)
against each of the 16 finger-2 subsites of the type 5'-GAT-ANN-GCG-3' to investigate
their DNA-binding specificity (Fig. 3, black bars). In addition, the 5'-nucleotide
recognition was analyzed by exposing zinc finger proteins to the specific target
oligonucleotide and three subsites which differed only in the 5'-nucleotide of the middle
triplet. For example, pAAA was tested on 5'-AAA-3', 5'-CAA-3', 5'-GAA-3', and
5'-TAA-3' subsites (Fig. 3, white bars). Many of the tested 3-finger proteins showed
exquisite DNA-binding specificity for the finger-2 subsite against which they were selected.
Binding properties of domains which were boxed in Fig. 2 and are considered the most
specific binders of each set are represented in the upper panel of Fig. 3, while additional
domains tested (marked with an asterisk in Fig. 2) are summarized in the lower panel of
Fig. 3. The exceptions were pAGC and pATC, whose DNA binding was too weak to be
detected by ELISA. The most promising helix for pAGC (DAS-H-LHT) (SEQ ID NO: 80),
which contained the expected amino acids Asp⁻¹ and His³ specifying a 3' cytosine and
middle guanine, but also a Thr⁶ not selected in any other case for a 5' adenine, was
analyzed without detectable DNA binding.
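
Throughout this description recognition helices are written in the form XXX-X-XXX. As a reading aid, the sketch below maps such a string onto helix positions -1, 1, 2, 3, 4, 5 and 6, assuming one-letter amino acid codes and that the three hyphen-separated groups cover positions -1 to 2, position 3, and positions 4 to 6. That layout is inferred from the text (for example, the aspartate in position 2 of RSD-E-RKR and the Leu in position 4 of TSQ-N-LVR); the function names are hypothetical.

```python
# Helix positions covered by the seven-residue notation, in written order.
POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def parse_helix(helix):
    """Map a helix such as 'RSD-E-RKR' to {position: one-letter residue}."""
    residues = helix.replace("-", "").upper()
    if len(residues) != len(POSITIONS):
        raise ValueError("expected seven residues (positions -1 to 6)")
    return dict(zip(POSITIONS, residues))


def base_contact_residues(helix):
    """Residues at positions -1, 3 and 6, which the text associates with the
    3', middle and 5' nucleotides of the 3 bp subsite, respectively."""
    by_position = parse_helix(helix)
    return {
        "3'": by_position[-1],
        "middle": by_position[3],
        "5'": by_position[6],
    }


if __name__ == "__main__":
    for helix in ("RSD-E-RKR", "TSQ-N-LVR", "DAS-H-LHT", "RSD-H-LTT"):
        print(helix, parse_helix(helix), base_contact_residues(helix))
```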

To analyze a larger set, the pool of coding sequences for pAGC was subcloned into
the plasmid pMal after the sixth round of selection and 18 individual clones were tested for
DNA-binding specificity, of which none showed measurable DNA binding in ELISA. In
the case of pATC, two helices (RRS-S-CRK and RRS-A-CRR) (SEQ ID NOs: 80, 81)
were selected containing a Leu⁴ to Cys⁴ mutation, for which no DNA binding was
detectable. Rational design was applied to find domains binding to 5'-AGC-3' or
5'-ATC-3', since no proteins binding these finger-2 subsites were generated by phage display.
Finger-2 mutants were constructed based on the recognition helices which were previously
demonstrated to bind specifically to 5'-GGC-3' (ERS-K-LAR (SEQ ID NO: 82), DPG-H-LVR
(SEQ ID NO: 83)) and 5'-GTC-3' (DPG-A-LVR) (SEQ ID NO: 84) [Segal et al.,
(1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. For pAGC two proteins were
constructed (ERS-K-LRA (SEQ ID NO: 85), DPG-H-LRV (SEQ ID NO: 86)) by simply
exchanging positions 5 and 6 to a 5' adenine recognition motif RA or RV (Fig. 3a, 3b and
3i). DNA binding of these proteins was below detection level. In the case of pATC two
finger-2 mutants containing an RV motif (Fig. 3b) were constructed (DPG-A-LRV (SEQ ID
NO: 87), DPG-S-LRV (SEQ ID NO: 88)). Both proteins bound DNA with extremely low
affinity regardless of whether position 3 was Ala or Ser.
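
The rational-design step above, in which positions 5 and 6 of a known helix are replaced by an RA or RV motif, can be written out as a small helper. This is only an illustrative sketch: it assumes the XXX-X-XXX notation lists helix positions -1, 1, 2, 3, 4, 5 and 6 in order, and the function names are hypothetical.

```python
# Helix positions in the order used by the XXX-X-XXX notation (assumed).
POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def parse_helix(helix):
    """Map helix positions -1..6 to residues, e.g. for 'ERS-K-LAR'."""
    residues = helix.replace("-", "")
    if len(residues) != len(POSITIONS):
        raise ValueError(f"expected 7 residues, got {helix!r}")
    return dict(zip(POSITIONS, residues))


def format_helix(by_position):
    """Write a position-to-residue mapping back in XXX-X-XXX notation."""
    residues = "".join(by_position[p] for p in POSITIONS)
    return f"{residues[:3]}-{residues[3]}-{residues[4:]}"


def graft_motif(helix, motif):
    """Replace positions 5 and 6 with a two-residue motif such as 'RA' or 'RV'."""
    if len(motif) != 2:
        raise ValueError("motif must contain exactly two residues")
    by_position = parse_helix(helix)
    by_position[5], by_position[6] = motif
    return format_helix(by_position)


if __name__ == "__main__":
    print(graft_motif("ERS-K-LAR", "RA"))
    print(graft_motif("DPG-H-LVR", "RV"))
    print(graft_motif("DPG-A-LVR", "RV"))
```

Applied to the 5'-GGC-3' and 5'-GTC-3' binders named in the text, the helper yields the position-5/6 variants that were tested for 5'-AGC-3' and 5'-ATC-3' recognition.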

Analysis of the 3-finger proteins on the sixteen finger-2 subsites by ELISA revealed
that some finger-2 domains bound best to a target they were not selected against. First, the
predominantly selected helix for 5'-AGA-3' was RSD-H-LTN (SEQ ID NO: 63), which in
fact bound 5'-AGG-3' (Fig. 3r). This can be explained by the Arg in position -1. In
addition, this protein showed a better discrimination of a 5' adenine compared to the
predominantly selected helix pAGG (RSD-H-LAE (SEQ ID NO: 55); Fig. 3j). Second, a
helix binding specifically to 5'-AAG-3' (RSD-N-LKN (SEQ ID NO: 61); Fig. 3p) was
actually selected against 5'-AAC-3' (Fig. 2), and bound more specifically to the finger-2
subsite 5'-AAG-3' than pAAG (RSD-T-LSN (SEQ ID NO: 48); Fig. 3c), which had been
selected in the 5'-AAG-3' set. In addition, proteins directed to target sites of the type
5'-ANG-3' showed cross-reactivity with all four target sites of the type 5'-ANG-3', except for
pAGG (Fig. 3j and 3r). The recognition of a middle purine seems more restrictive than that of a
middle pyrimidine, because pAAG (RSD-N-LKN (SEQ ID NO: 61); Fig. 3p) also had only
moderate cross-reactivity.

In comparison, the proteins pACG (RTD-T-LRD (SEQ ID NO: 52); Fig. 3g) and
pATG (RRD-A-LNV (SEQ ID NO: 58); Fig. 3m) show cross-reactivity with all 5'-ANG-3'
subsites. The recognition of a middle pyrimidine has been reported to be difficult in
previous studies for domains binding to 5'-GNG-3' DNA sequences [Segal et al., (1999)
Proc Natl Acad Sci USA 96(6), 2758-2763; Dreier et al., (2000) J. Mol. Biol. 303,
489-502]. To improve the recognition of the middle nucleotide, finger-2 mutants containing
different amino acid residues in position 3 were generated by site-directed mutagenesis.
Binding of pAAG (RSD-T-LSN (SEQ ID NO: 48); Fig. 3c) was more specific for a middle
adenine after a Thr³ to Asn³ mutation (Fig. 3o). The binding to 5'-ATG-3' (RRD-A-LNV
(SEQ ID NO: 58); Fig. 3m) was improved by a single amino acid exchange, Ala³ to Gln³
(Fig. 3w), while a Thr³ to Asp³ or Gln³ mutation for pACG (RTD-T-LRD (SEQ ID NO:
52); Fig. 3g) abolished DNA binding. In addition, the recognition helix pAGT (HRT-T-LLN
(SEQ ID NO: 56); Fig. 3k) showed cross-reactivity for the middle nucleotide which
was reduced by a Leu⁵ to Thr⁵ substitution (Fig. 3s). Surprisingly, improved
discrimination for the middle nucleotide was often associated with some loss of specificity
for the recognition of the 5' adenine (compare Fig. 3o-3p, 3m-3w, 3k-3s).

The previously described finger-2 library based on the 3-finger protein C7
[Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763] was not suitable for the
selection of zinc finger domains binding to subsites containing a 5' adenine or cytosine,
due to the limitation of aspartate in position 2 of finger 3,
which makes a cross-subsite contact to the nucleotide complementary to the 5' position of
the finger-2 subsite (Fig. 1). We eliminated this contact by exchanging finger 3 with a
domain lacking Asp². Finger 2 of C7.GAT was randomized and a phage display library
constructed. In most cases, novel 3-finger proteins were selected binding to finger-2
subsites of the type 5'-ANN-3'. For the subsites 5'-AGC-3' and 5'-ATC-3' no tight
binders were identified. This was not expected, because the domains binding to the subsites
5'-GGC-3' and 5'-GTC-3' previously selected from the C7-based phage display library
showed excellent DNA-binding specificity and an affinity of 40 nM to their target site [Segal
et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. One simple explanation would
be the limited randomization strategy, which used VNS codons that do not encode
the aromatic amino acid residues. These were not included in the library, because for the
domains binding to 5'-GNN-3' subsites no aromatic amino acid residues were selected,
even though they were included in the randomization strategy [Segal et al., (1999) Proc
Natl Acad Sci USA 96(6), 2758-2763]. However, there have been zinc finger domains
reported containing aromatic residues, like finger 2 of CFII2 (VKD-Y-LTK (SEQ ID NO:
89); [Gogos et al., (1996) PNAS 93, 2159-2164]), finger 1 of TFIIIA (KNW-K-LQA (SEQ
ID NO: 90); [Wuttke et al., (1997) J. Mol. Biol. 273(1), 183-206]), finger 1 of TTK
(HIS-N-FCR (SEQ ID NO: 78); [Fairall et al., (1993) Nature (London) 366(6454), 483-7]) and
finger 2 of GLI (AQY-M-LVV (SEQ ID NO: 91); [Pavletich et al., (1993) Science
(Washington, D. C., 1883-) 261(5129), 1701-7]). Aromatic amino acid residues might be
important for the recognition of the subsites 5'-AGC-3' and 5'-ATC-3'.
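
The effect of a VNS randomization scheme on the encoded residues can be checked directly. The sketch below assumes the standard genetic code and the IUPAC degenerate-base meanings V = A/C/G, N = any base and S = C/G; the helper names are illustrative only.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate bases used by the VNS scheme.
IUPAC = {"V": "ACG", "N": "ACGT", "S": "CG"}


def expand(degenerate):
    """List every concrete codon matched by a degenerate codon such as 'VNS'."""
    return ["".join(c) for c in product(*(IUPAC.get(b, b) for b in degenerate))]


def encoded_residues(degenerate):
    """Return the set of amino acids (or '*' for stop) a degenerate codon encodes."""
    return {CODON_TABLE[codon] for codon in expand(degenerate)}


if __name__ == "__main__":
    residues = encoded_residues("VNS")
    missing = set(AMINO) - residues - {"*"}
    print("codons:", len(expand("VNS")))
    print("encoded:", "".join(sorted(residues)))
    print("missing:", "".join(sorted(missing)))
```

Because every codon for Phe, Tyr and Trp begins with T, none of them can be produced by a codon whose first base is restricted to A, C or G; the same restriction also removes Cys and all three stop codons.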

In recent years it has become clear that the recognition helix of Cys2-His2 zinc
finger domains can adopt different orientations relative to the DNA in order to achieve
optimal binding [Pabo et al., (2000) J. Mol. Biol. 301, 597-624]. However, the orientation
of the helix in this region may be partially restricted by the frequently observed interaction
involving the zinc ion, His⁷ and the phosphate backbone. Furthermore, comparison of
binding properties of interactions in protein/DNA complexes has led to the conclusion
that the Cα atom of position 6 is usually 8.8 ± 0.8 Å apart from the nearest heavy atom of
the 5' nucleotide in the DNA subsite, which favors only the recognition of a 5' guanine by
Arg⁶ or Lys⁶ [Pabo et al., (2000) J. Mol. Biol. 301, 597-624]. To date, no interaction of any
other position 6 residue with a base other than guanine has been observed in protein/DNA
complexes. For example, finger 4 of YY1 (QST-N-LKS) (SEQ ID NO: 92) recognizes
5'-CAA-3' but there was no contact observed between Ser⁶ and the 5' cytosine [Houbaviy et
al., (1996) Proc Natl Acad Sci USA 93(24), 13577-82]. Further, in the case of Thr⁶ in
finger 3 of YY1 (LDF-N-LRT) (SEQ ID NO: 93), recognizing 5'-ATT-3', and in finger 2
of Zif268 (RSD-H-LTT) (SEQ ID NO: 79), specifying 5'-T/GGG-3', no contact with the 5'
nucleotide was observed [Houbaviy et al., (1996) Proc Natl Acad Sci USA 93(24), 13577-
82; Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180]. Finally, Ala⁶ of finger 2 of
tramtrack (RKD-N-MTA) (SEQ ID NO: 94) binding to the subsite 5'-AAG-3' does not
contact the 5' adenine [Fairall et al., (1993) Nature (London) 366(6454), 483-7].

Amino acid residues Ala⁶, Val⁶, Asn⁶ and even Arg⁶, which in a different context
was demonstrated to bind a 5' guanine efficiently [Segal et al., (1999) Proc Natl Acad Sci
USA 96(6), 2758-2763], were predominantly selected from the C7.GAT library for DNA
subsites of the type 5'-ANN-3' (Fig. 2). In addition, position 6 was selected as Thr, Glu
and Asp depending on the finger-2 target site. This is consistent with early studies from
other groups where positions of adjacent fingers were randomized [Jamieson et al., (1996)
Proc Natl Acad Sci USA 93, 12834-12839; Isalan et al., (1998) Biochemistry 37(35),
12026-12033]. Screening of phage display libraries had resulted in selection of amino acid
residues Tyr, Val, Thr, Asn, Lys, Glu and Leu, as well as Gly, Ser and Arg, but not Ala, for
the recognition of a 5' adenine. In addition, using a sequential phage display selection
strategy several domains binding to 5'-ANN-3' subsites were identified and specificity
evaluated by target site selections. Arg, Ala and Thr in position 6 of the helix were
demonstrated to recognize predominantly a 5' adenine [Wolfe et al., (1999) Annu. Rev.
Biophys. Biomol. Struct. 3, 183-212].

In addition, Thr⁶ specifies a 5' adenine as shown by target site selection for finger 5
of Gfi-1 (QSS-N-LTD) (SEQ ID NO: 95) binding to the subsite 5'-AAA-3' [Zweidler-McKay
et al., (1996) Mol. Cell. Biol. 16(8), 4024-4034]. These examples, including the
present results, indicate that there is likely a relation between the amino acid residue in
position 6 and the 5' adenine, because these residues are frequently selected. This is at odds
with data from crystallographic studies, which never showed interaction of position 6 of the
α-helix with a 5' nucleotide except guanine. One simple explanation might be that short amino acid
residues, like Ala, Val, Thr, or Asn, are not a steric hindrance in the binding mode of
domains recognizing 5'-ANN-3' subsites. This is supported by results gathered by site-
directed mutagenesis in position 6 for a helix (QRS-A-LTV) (SEQ ID NO: 96) binding to
a 5'-G/ATA-3' subsite [Gogos et al., (1996) PNAS 93, 2159-2164]. Replacement of Val⁶
with Ala⁶, which was also found for domains described here, or Lys⁶ had no effect on the
binding specificity or affinity.

Computer modeling was used to investigate possible interactions of the frequently
selected Ala⁶, Asn⁶ and Arg⁶ with a 5' adenine. Analysis of the interaction of Ala⁶ in the
helix binding to 5'-AAA-3' (QRA-N-LRA; Fig. 3a) (SEQ ID NO: 46) with a 5' adenine
was based on the coordinates of the protein/DNA complex of finger 1 (QSG-S-LTR) (SEQ
ID NO: 97) from a Zif268 variant. If Gln⁻¹ and Asn³ of QRA-N-LRA (SEQ ID NO: 98)
hydrogen bond with their respective adenine bases in the canonical way, these interactions
should fix a distance of about 8 Å between the methyl group of Ala⁶ and the 5' adenine and
more than 11 Å between the methyl group of Ala⁶ and the thymine base-paired to the
adenine, suggesting also that no direct contact can be proposed for Val⁶ and Thr⁶.

Interestingly, the expected lack of 5' specificity by short amino acids in position 6
of the α-helix is only partially supported by the binding data. Helices such as RRD-A-LNV
(SEQ ID NO: 58) (Fig. 3m) and the finger-2 helix RSD-H-LTT (SEQ ID NO: 5) of
C7.GAT did indeed show essentially no 5' specificity. However, helix DSG-N-LRV (SEQ
ID NO: 47) (Fig. 3b) displayed excellent specificity for a 5' adenine, while TSH-G-LTT
(SEQ ID NO: 70) (Fig. 3y) was specific for 5' adenine or guanine. Other helices with short
position-6 residues displayed varying degrees of 5' specificity, with the only obvious
consistency being that 5' thymine was usually excluded (Fig. 3). Since it is unlikely that
the position-6 residue can make a direct contribution to specificity, the observed binding
patterns must derive from another source. Possibilities include local sequence-specific
DNA structure and overlapping interactions from neighboring domains. The latter
possibility is disfavored, however, because the residue in position 2 of finger 3 (which is
frequently observed to contact the neighboring site) is glycine in the parental protein
C7.GAT, and because 5' thymine was not excluded by the two helices mentioned above.

Asparagine was also frequently selected in position 6. Helix HRT-T-LTN (SEQ ID
NO: 56) (Fig. 3k) and RSD-T-LSN (SEQ ID NO: 48) (Fig. 3c) displayed excellent
specificity for 5' adenine. However, Asn⁶ also seemed to impart specificity for both
adenine and guanine (Fig. 3n, 3p and 3r), suggesting an interaction with the N7 common to
both nucleotides. Computer modeling of the helix binding to 5'-AGG-3' (RSD-H-LTN
(SEQ ID NO: 90); Fig. 3r), based on the coordinates of finger 2, binding to 5'-TGG-3', in
the Zif268/DNA crystal structure (RSD-H-LTT (SEQ ID NO: 79); [Elrod-Erickson et al.,
(1996) Structure 4(10), 1171-1180]), suggested that the Nδ of Asn⁶ would be
approximately 4.5 Å from N7 of the 5' adenine. A modest reorientation of the α-helix,
which is considered within the range of canonical docking orientations [Pabo et al., (2000)
J. Mol. Biol. 301, 597-624], could plausibly bring the Nδ within hydrogen bonding
distance, analogous to the reorientation observed when glutamate rather than arginine
appears in position -1. However, it is interesting to speculate why Asn⁶ was selected in
this 5'-ANN-3' recognition set while the longer Gln⁶ was not. Gln⁶, being more flexible,
may have been able to stabilize other interactions that were selected against during phage
display. Alternatively, the shorter side chain of Asn⁶ might accommodate an ordered water
molecule that could contact the 5' nucleotide without reorientation of the helix.

The final residue to be considered is Arg⁶. It was somewhat surprising that Arg⁶
was selected so frequently on 5'-ANN-3' targets because in our previous studies, it was
unanimously selected to recognize a 5' guanine with high specificity [Segal et al., (1999)
Proc Natl Acad Sci USA 96(6), 2758-2763]. However, in the current study, Arg⁶
primarily specified 5' adenine (Fig. 3e, f, h and v), in some cases in addition to recognition
of a 5' guanine (Fig. 3t and u). Computer modeling of a helix binding to 5'-ACA-3'
(SPA-D-LTR (SEQ ID NO: 50); Fig. 3e), based on the coordinates of finger 1 QSG-S-LTR (SEQ
ID NO: 98) of a Zif268 variant binding 5'-GCA-3' [Elrod-Erickson et al., (1998) Structure
6(4), 451-464], suggested that Arg⁶ could easily adopt a configuration that allowed it to
make a cross-strand hydrogen bond to O4 of a thymine base-paired to 5' adenine. In fact,
Arg⁶ could bind with good geometry to both the O4 of thymine and O6 of a guanine
base-paired to a middle cytosine. Such an interaction is consistent with the fact that Arg⁶ was
selected almost unanimously when the target sequence was 5'-ACN-3'. The expectation
for arginine to facilitate multiple interactions is compelling. Several lysines in TFIIIA were
observed by NMR to be conformationally flexible [Foster et al., (1997) Nat. Struct. Biol.
4(8), 605-608], and Gln⁻¹ behaves in a manner which suggests flexibility [Dreier et al.,
(2000) J. Mol. Biol. 303, 489-502]. Arginine has more rotatable bonds and more hydrogen
bonding potential than lysine or glutamine and it is attractive to speculate that Arg⁶ is not
limited to recognition of 5' guanine.

Amino acid residues in positions -1 and 3 were generally selected in analogy to
their 5'-GNN-3' counterparts with two exceptions. His⁻¹ was selected for pAGT and
pATT, recognizing a 3' thymine (Fig. 3k, 3n and 3y), and Ser⁻¹ for pACA, recognizing a 3'
adenine (Fig. 3e and 3t). While Gln⁻¹ was frequently used to specify a 3' adenine in
subsites of the type 5'-GNN-3', a new element of 3' adenine recognition was suggested
from this study involving Ser⁻¹ selected for domains recognizing the 5'-ACA-3' subsite
(Fig. 2), which can make a hydrogen bond with the 3' adenine. Computer modeling
demonstrates that Ala², co-selected in the helix SPA-D-LTR (SEQ ID NO: 50) (Fig. 3e),
can potentially make a van der Waals contact with the methyl group of the thymine
base-paired to 3' adenine. The best evidence that Ala² might be involved is that helix
SPA-D-LTR (SEQ ID NO: 50) (Fig. 3e) is strongly specific for 3' adenine while SH^-D-LVR
(SEQ ID NO: 65) (Fig. 3t) is not. Gln⁻¹ is often sufficient for 3' adenine recognition.
However, data from our previous studies suggested that the side chain of Gln⁻¹ can adopt
multiple conformations, enabling, for example, recognition of 3' thymine [Nardelli et al.,
(1992) Nucleic Acids Res. 20(16), 4137-44; Elrod-Erickson et al., (1998) Structure 6(4),
451-464; Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. Ala² in combination with Ser⁻¹
may be an alternative means to specify a 3' adenine.

Another interaction not observed in the 5'-GNN-3' study is the cooperative
recognition of 3' thymine by His⁻¹ and the residue at position 2. In finger 1 of the crystal
structure of the tramtrack/DNA complex, helix HIS-N-FCR (SEQ ID NO: 99) binds the
subsite 5'-GAT-3' [Fairall et al., (1993) Nature (London) 366(6454), 483-7]. The His⁻¹
ring is perpendicular to the plane of the 3' thymine base and is approximately 4 Å from the
methyl group. Ser² additionally makes a hydrogen bond with O4 of 3' thymine. A similar
set of contacts can be envisioned by computer modeling for the recognition of 5'-ATT-3'
by helix HKN-A-LQN (SEQ ID NO: 100) (Fig. 3n). Asn² in this helix has the potential
to hydrogen bond not only with 3' thymine but also with the adenine base-paired to thymine.
His⁻¹ was also found for the helix binding 5'-AGT-3' (HRT-T-LLN (SEQ ID NO: 98); Fig.
3k) in combination with a Thr². Thr is structurally similar to Ser and might be involved in
a similar recognition mechanism.

In conclusion, the results of the characterization of zinc finger domains reported in
this study binding 5'-ANN-3' DNA subsites are consistent with the overall view that there is
no general recognition code, which makes rational design of additional domains difficult.
However, phage display selections can be applied and pre-defined zinc finger domains can
serve as modules for the construction of artificial transcription factors. The domains
characterized here enable targeting of DNA sequences other than 5'-(GNN)6-3'. This is
an important supplement to existing domains, since G/C-rich sequences often contain
binding sites for cellular proteins and 5'-(GNN)6-3' sequences may not be found in all
promoters.

II. Polynucleotides, Expression Vectors and Transformed Cells

The invention includes a nucleotide sequence encoding a zinc finger-nucleotide
binding polypeptide. DNA sequences encoding the zinc finger-nucleotide binding
polypeptides of the invention, including native, truncated, and expanded polypeptides, can
be obtained by several methods. For example, the DNA can be isolated using hybridization
procedures which are well known in the art. These include, but are not limited to: (1)
hybridization of probes to genomic or cDNA libraries to detect shared nucleotide
sequences; (2) antibody screening of expression libraries to detect shared structural
features; and (3) synthesis by the polymerase chain reaction (PCR). RNA sequences of the
invention can be obtained by methods known in the art (See, for example, Current
Protocols in Molecular Biology, Ausubel, et al., Eds., 1989).

Specific DNA sequences encoding zinc finger-nucleotide
binding polypeptides of the invention can be developed by: (1) isolation of a double-stranded
DNA sequence from the genomic DNA; (2) chemical manufacture of a DNA sequence to
provide the necessary codons for the polypeptide of interest; and (3) in vitro synthesis of a
double-stranded DNA sequence by reverse transcription of mRNA isolated from a
eukaryotic donor cell. In the latter case, a double-stranded DNA complement of mRNA is
eventually formed which is generally referred to as cDNA. Of these three methods for
developing specific DNA sequences for use in recombinant procedures, the isolation of
genomic DNA is the least common. This is especially true when it is desirable to obtain
the microbial expression of mammalian polypeptides due to the presence of introns.

For obtaining zinc finger derived-DNA binding polypeptides, the synthesis of DNA
sequences is frequently the method of choice when the entire sequence of amino acid
residues of the desired polypeptide product is known. When the entire sequence of amino
acid residues of the desired polypeptide is not known, the direct synthesis of DNA
sequences is not possible and the method of choice is the formation of cDNA sequences.
Among the standard procedures for isolating cDNA sequences of interest is the formation
of plasmid-carrying cDNA libraries which are derived from reverse transcription of mRNA
which is abundant in donor cells that have a high level of genetic expression. When used
in combination with polymerase chain reaction technology, even rare expression products
can be cloned. In those cases where significant portions of the amino acid sequence of the
polypeptide are known, the production of labeled single or double-stranded DNA or RNA
probe sequences duplicating a sequence putatively present in the target cDNA may be
employed in DNA/DNA hybridization procedures which are carried out on cloned copies
of the cDNA which have been denatured into a single-stranded form (Jay, et al., Nucleic
Acid Research 11:2325, 1983).
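
When the full amino acid sequence is known, as discussed above for direct synthesis, one possible coding sequence can be written down by back-translation. The following sketch assumes the standard genetic code and simply picks the first codon listed for each residue; this choice is arbitrary and hypothetical, not a codon-usage recommendation and not the sequences used in the original work. The seven-residue helix QRA-N-LRA from the text serves only as example input.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One codon per residue: the first one encountered in the table (arbitrary).
FIRST_CODON = {}
for codon, residue in CODON_TABLE.items():
    FIRST_CODON.setdefault(residue, codon)


def back_translate(protein):
    """Return one DNA sequence that encodes the given one-letter protein string."""
    return "".join(FIRST_CODON[residue] for residue in protein)


def translate(dna):
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    helix = "QRANLRA"  # QRA-N-LRA written without separators
    dna = back_translate(helix)
    print(dna)
    print(translate(dna) == helix)
```

Because the genetic code is degenerate, many other DNA sequences encode the same residues; a practical design would choose codons suited to the expression host.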

A polypeptide of this invention can be operatively linked to one or more functional
peptides. Such functional peptides are well known in the art and can be a transcription
regulating factor such as a repressor or activation domain or a peptide having other
functions. Exemplary and preferred such functional peptides are nucleases, methylases,
nuclear localization domains, and restriction enzymes such as endo- or ectonucleases (See,
e.g., Chandrasegaran and Smith, Biol. Chem., 380:841-848, 1999).

An exemplary repression domain peptide is the ERF repressor domain (ERD)
(Sgouras, D. N., Athanasiou, M. A., Beal, G. J., Jr., Fisher, R. J., Blair, D. G. &
Mavrothalassitis, G. J. (1995) EMBO J. 14, 4781-4793), defined by amino acids 473 to 530
of the ets2 repressor factor (ERF). This domain mediates the antagonistic effect of ERF on
the activity of transcription factors of the ets family. A synthetic repressor is constructed by
fusion of this domain to the N- or C-terminus of the zinc finger protein. A second repressor
protein is prepared using the Krüppel-associated box (KRAB) domain (Margolin, J. F.,
Friedman, J. R., Meyer, W., K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994)
Proc. Natl. Acad. Sci. USA 91, 4509-4513). This repressor domain is commonly found at
the N-terminus of zinc finger proteins and presumably exerts its repressive activity on
TATA-dependent transcription in a distance- and orientation-independent manner (Pengue,
G. & Lania, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1015-1020), by interacting with the
RING finger protein KAP-1 (Friedman, J. R., Fredericks, W. J., Jensen, D. E., Speicher, D.
W., Huang, X.-P., Neilson, E. G. & Rauscher III, F. J. (1996) Genes & Dev. 10, 2067-
2078). We utilized the KRAB domain found between amino acids 1 and 97 of the zinc
finger protein KOX1 (Margolin, J. F., Friedman, J. R., Meyer, W., K.-H., Vissing, H.,
Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513). In
this case an N-terminal fusion with a zinc-finger polypeptide is constructed. Finally, to
explore the utility of histone deacetylation for repression, amino acids 1 to 36 of the Mad
mSIN3 interaction domain (SID) are fused to the N-terminus of the zinc finger protein
(Ayer, D. E., Laherty, C. D., Lawrence, Q. A., Armstrong, A. P. & Eisenman, R. N. (1996)
Mol. Cell. Biol. 16, 5772-5781). This small domain is found at the N-terminus of the
transcription factor Mad and is responsible for mediating its transcriptional repression by
interacting with mSIN3, which in turn interacts with the co-repressor N-CoR and with the
histone deacetylase mRPD1 (Heinzel, T., Lavinsky, R. M., Mullen, T.-M., Söderström, M.,
Laherty, C. D., Torchia, J., Yang, W.-M., Brand, G., Ngo, S. D. et al. (1997) Nature 387,
43-46). To examine gene-specific activation, transcriptional activators are generated by
fusing the zinc finger polypeptide to amino acids 413 to 489 of the herpes simplex virus
VP16 protein (Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. (1988) Nature 335, 563-
564), or to an artificial tetrameric repeat of VP16's minimal activation domain (Seipel, K.,
Georgiev, O. & Schaffner, W. (1992) EMBO J. 11, 4961-4968), termed VP64.

III. Pharmaceutical Compositions

In another aspect, the present invention provides a pharmaceutical composition
comprising a therapeutically effective amount of a zinc finger-nucleotide binding
polypeptide or a therapeutically effective amount of a nucleotide sequence that encodes a
zinc finger-nucleotide binding polypeptide in combination with a pharmaceutically
acceptable carrier.

As used herein, the terms "pharmaceutically acceptable", "physiologically
tolerable" and grammatical variations thereof, as they refer to compositions, carriers,
diluents and reagents, are used interchangeably and represent that the materials are capable
of administration to or upon a human without the production of undesirable physiological
effects such as nausea, dizziness, gastric upset and the like which would be to a degree that
would prohibit administration of the composition.

The preparation of a pharmacological composition that contains active ingredients
dissolved or dispersed therein is well understood in the art. Typically such compositions
are prepared as sterile injectables either as liquid solutions or suspensions, aqueous or non-
aqueous; however, solid forms suitable for solution, or suspensions, in liquid prior to use
can also be prepared. The preparation can also be emulsified.

The active ingredient can be mixed with excipients which are pharmaceutically
acceptable and compatible with the active ingredient and in amounts suitable for use in the
therapeutic methods described herein. Suitable excipients are, for example, water, saline,
dextrose, glycerol, ethanol or the like and combinations thereof. In addition, if desired, the
composition can contain minor amounts of auxiliary substances such as wetting or
emulsifying agents, as well as pH buffering agents and the like which enhance the
effectiveness of the active ingredient.

The therapeutic pharmaceutical composition of the present invention can include
pharmaceutically acceptable salts of the components therein. Pharmaceutically acceptable
salts include the acid addition salts (formed with the free amino groups of the polypeptide)
that are formed with inorganic acids such as, for example, hydrochloric or phosphoric
acids, or such organic acids as acetic, tartaric, mandelic and the like. Salts formed with the
free carboxyl groups can also be derived from inorganic bases such as, for example,
sodium, potassium, ammonium, calcium or ferric hydroxides, and such organic bases as
isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine and the like.

Physiologically tolerable carriers are well known in the art. Exemplary of liquid
carriers are sterile aqueous solutions that contain no materials in addition to the active
ingredients and water, or contain a buffer such as sodium phosphate at physiological pH
value, physiological saline or both, such as phosphate-buffered saline. Still further,
aqueous carriers can contain more than one buffer salt, as well as salts such as sodium and
potassium chlorides, dextrose, propylene glycol, polyethylene glycol and other solutes.
Liquid compositions can also contain liquid phases in addition to and to the exclusion of
water. Exemplary of such additional liquid phases are glycerin, vegetable oils such as
cottonseed oil, organic esters such as ethyl oleate, and water-oil emulsions.

IV. Uses

In one embodiment, a method of the invention includes a process for modulating
(inhibiting or suppressing) expression of a nucleotide sequence comprising a zinc finger-
nucleotide binding motif, which method includes the step of contacting the zinc finger-
nucleotide binding motif with an effective amount of a zinc finger-nucleotide binding
polypeptide that binds to the motif. In the case where the nucleotide sequence is a
promoter, the method includes inhibiting the transcriptional transactivation of a promoter
containing a zinc finger-DNA binding motif. The term "inhibiting" refers to the
suppression of the level of activation of transcription of a structural gene operably linked to
a promoter, containing a zinc finger-nucleotide binding motif, for example. In addition, the
zinc finger-nucleotide binding polypeptide derivative may bind a motif within a structural
gene or within an RNA sequence.

The term "effective amount" includes that amount which results in the deactivation
of a previously activated promoter or that amount which results in the inactivation of a
promoter containing a zinc finger-nucleotide binding motif, or that amount which blocks
transcription of a structural gene or translation of RNA. The amount of zinc finger
derived-nucleotide binding polypeptide required is that amount necessary to either displace
a native zinc finger-nucleotide binding protein in an existing protein/promoter complex, or
that amount necessary to compete with the native zinc finger-nucleotide binding protein to
form a complex with the promoter itself. Similarly, the amount required to block a
structural gene or RNA is that amount which binds to and blocks RNA polymerase from
reading through on the gene or that amount which inhibits translation, respectively.
Preferably, the method is performed intracellularly. By functionally inactivating a
promoter or structural gene, transcription or translation is suppressed. Delivery of an
effective amount of the inhibitory protein for binding to or "contacting" the cellular
nucleotide sequence containing the zinc finger-nucleotide binding protein motif can be
accomplished by one of the mechanisms described herein, such as by retroviral vectors or
liposomes, or other methods well known in the art.

The term "modulating" refers to the suppression, enhancement or induction of a
function. For example, the zinc finger-nucleotide binding polypeptide of the invention may
modulate a promoter sequence by binding to a motif within the promoter, thereby
enhancing or suppressing transcription of a gene operatively linked to the promoter
nucleotide sequence. Alternatively, modulation may include inhibition of transcription of a
gene where the zinc finger-nucleotide binding polypeptide binds to the structural gene and
blocks DNA-dependent RNA polymerase from reading through the gene, thus inhibiting
transcription of the gene. The structural gene may be a normal cellular gene or an
oncogene, for example. Alternatively, modulation may include inhibition of translation of
a transcript.

The promoter region of a gene includes the regulatory elements that typically lie 5'
to a structural gene. If a gene is to be activated, proteins known as transcription factors
attach to the promoter region of the gene. This assembly resembles an "on switch" by
enabling an enzyme to transcribe a second genetic segment from DNA to RNA. In most
cases the resulting RNA molecule serves as a template for synthesis of a specific protein;
sometimes RNA itself is the final product.

The promoter region may be a normal cellular promoter or, for example, an onco-
promoter. An onco-promoter is generally a virus-derived promoter. For example, the long
terminal repeat (LTR) of retroviruses is a promoter region which may be a target for a zinc
finger binding polypeptide variant of the invention. Promoters from members of the
Lentivirus group, which include such pathogens as human T-cell lymphotrophic virus
(HTLV) 1 and 2, or human immunodeficiency virus (HIV) 1 or 2, are examples of viral
promoter regions which may be targeted for transcriptional modulation by a zinc finger
binding polypeptide of the invention.

To investigate whether the domains described here that specifically bind to 5'-ANN-3'
DNA sequences are suitable for the construction of such artificial transcription factors,
four 6-finger proteins were assembled containing various numbers of 5'-ANN-3' domains.
For each of the 6-finger proteins, two 3-finger-coding regions were generated by PCR
overlap extension using the Sp1C framework [Beerli et al., (1998) Proc Natl Acad Sci USA
95(25), 14628-14633]. These 3-finger proteins were then fused to create 6-finger
proteins via restriction sites (Fig. 4a) and cloned into the bacterial expression vector pMal
for analysis of DNA-binding specificity and affinity. First, the 6-finger protein pAart was
constructed, designed to recognize the arbitrary 18 bp target site 5'-ATG-TAG-AGA-
AAA-ACC-AGG-3', which was completely free of 5'-GNN-3' triplets. Secondly, three
6-finger proteins containing both 5'-GNN-3' and 5'-ANN-3' domains were constructed.
The well-characterized model of the erbB-2 and erbB-3 genes was chosen for study; we have
previously shown that regulation of these endogenous genes was specifically achieved by,
respectively, the 6-finger proteins pE2C and pE3, which bound to 5'-(GNN)6-3' DNA
sequences [Beerli et al., (2000) Proc Natl Acad Sci USA 97(4), 1495-1500; Beerli et al.,
(2000) J. Biol. Chem. 275(42), 32617-32627].
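The statement that the pAart target site contains no 5'-GNN-3' triplets can be checked mechanically. The following is a minimal sketch, not part of the described method: the function names are hypothetical, and it assumes the target site is read 5' to 3' and divided into consecutive 3 bp subsites, one per finger.

```python
from collections import Counter


def split_triplets(site: str) -> list[str]:
    """Split a target site written 5'->3' into 3 bp subsites, ignoring separators."""
    bases = "".join(ch for ch in site.upper() if ch in "ACGT")
    if len(bases) % 3 != 0:
        raise ValueError("target site length must be a multiple of 3")
    return [bases[i:i + 3] for i in range(0, len(bases), 3)]


def classify_subsites(site: str) -> Counter:
    """Count subsites by their 5' nucleotide, using keys such as 'ANN' or 'GNN'."""
    return Counter(triplet[0] + "NN" for triplet in split_triplets(site))


# 18 bp pAart target site as given in the text.
PAART_SITE = "ATG-TAG-AGA-AAA-ACC-AGG"

counts = classify_subsites(PAART_SITE)
print(split_triplets(PAART_SITE))
print(dict(counts))
```

A site for which `counts["GNN"]` is zero cannot be addressed with 5'-GNN-3' domains alone, which is the situation the pAart protein was designed to illustrate.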

The 6-finger protein pE2X binding to the target site 5'-ACC GGA GAA ACC AGG
GGA-3' (SEQ ID NO: 101) in position -168 to -151 in the 5' untranslated region (UTR) of
the erbB-2 gene was constructed (Fig. 4a). In addition, two proteins binding in the 5' UTR
of the erbB-3 gene were generated. The protein pE3Y bound to the target site 5'-ATC
GAG GCA AGA GCC ACC-3' (SEQ ID NO: 102) in position -94 to -111 of the 5' UTR,
pE3Z in position -79 to -61 recognizing 5'-GCC GCA GCA GCC ACC AAT-3' (SEQ ID
NO: 103) (Fig. 4a). The coding sequences for the four 6-finger proteins were then cloned
into the bacterial expression vector pMal. Crude extracts containing the zinc finger-MBP
fusion protein were tested for DNA binding in ELISA (Fig. 4b). All four proteins showed
exquisite binding specificity for their target DNA with no cross-reactivity to the other target
sites tested. The affinities were determined in gel mobility shift assays with purified
proteins. The protein Aart bound its DNA target site with an affinity of 7.5 pM, pE2X with
an affinity of 15 nM, pE3Y of 8 nM and pE3Z of 2 nM, which is in the range of affinities
we have observed for most 6-finger proteins analyzed so far.
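To put these affinities in perspective, the sketch below estimates the fraction of target sites occupied at a given free protein concentration. It assumes a simple 1:1 equilibrium binding model, fraction bound = [P] / (Kd + [P]), and an illustrative protein concentration of 1 nM; neither the model nor that concentration is taken from the experiments described here.

```python
# Dissociation constants reported in the text, converted to mol/l.
KD_MOLAR = {
    "Aart": 7.5e-12,
    "pE2X": 15e-9,
    "pE3Y": 8e-9,
    "pE3Z": 2e-9,
}


def fraction_bound(protein_conc_molar: float, kd_molar: float) -> float:
    """Fraction of DNA sites occupied under a 1:1 equilibrium binding model."""
    if protein_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return protein_conc_molar / (kd_molar + protein_conc_molar)


# Hypothetical free protein concentration used only for illustration.
EXAMPLE_CONC_MOLAR = 1e-9

for name, kd in KD_MOLAR.items():
    print(f"{name}: {fraction_bound(EXAMPLE_CONC_MOLAR, kd):.3f}")
```

Under this model, half of the sites are occupied when the free protein concentration equals the dissociation constant, so a lower value means tighter binding.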

To evaluate the potential for specific gene regulation, the protein-coding sequence
for Aart was cloned into the vector pcDNA and fused to the VP64 activation domain, a
tetrameric repeat of the minimal activation domain derived from the herpes simplex virus
protein VP16 [Seipel et al., (1992) EMBO J. 11(13), 4961-4968; Beerli et al., (1998) Proc
Natl Acad Sci USA 95(25), 14628-14633]. HeLa cells were transiently co-transfected
with the effector constructs coding either for the zinc finger protein alone or for its fusion with
the VP64 domain, and a luciferase reporter plasmid under the control of a minimal
promoter containing the zinc finger-binding site and a TATA-box. The Aart-binding site
was present in five copies while a promoter used as control contained six 2C7-binding
sites. The expression of luciferase was up-regulated 2000-fold by the pAart-VP64 fusion
protein in comparison to the control containing no activation domain (Fig. 5a). Activation
was specific since no regulation of the reporter containing 6 x 2C7-binding sites was
observed (Fig. 5b). As an additional control for specificity, the 6-finger protein p2C7 [Wu et
al., (1995) PNAS 92, 344-348] was also tested, which only activated luciferase expression
when the promoter contained 6 x 2C7-binding sites (Fig. 5b), but not when the promoter
contained the 5 x Aart-binding sites (Fig. 5a). The 3-finger proteins of each half site of pAart
fused to VP64 were not capable of activating luciferase expression, which is consistent with
previous results [Beerli et al., (2000) Proc Natl Acad Sci USA 97(4), 1495-1500; Beerli et
al., (2000) J. Biol. Chem. 275(42), 32617-32627].

To investigate the ability of the 6-finger proteins pE2X, pE3Y and pE3Z to
transcriptionally regulate the endogenous erbB-2 and erbB-3 genes, respectively, the
coding sequences were subcloned into the retroviral vector pMX-IRES-GFP and fused to
the VP64 activation or the KRAB repression domain of Kox-1 [Margolin et al., (1994)
Proc. Natl. Acad. Sci. USA 91, 4509-4513; Beerli et al., (1998) Proc Natl Acad Sci USA
95(25), 14628-14633]. Retrovirus was used to infect the human carcinoma cell line A431.
Three days after infection, cells were subjected to flow cytometry to analyze expression
levels of ErbB-2 and ErbB-3 (Fig. 6). The infection efficiency was determined by
measurement of GFP expression. All cell pools, with the exception of pE2X-VP64, were
infected to more than 80%. To determine the expression levels of ErbB-2 and ErbB-3,
cells were stained with specific antibodies, or a control antibody specific for ErbB-1. The
fusion protein pE2X-VP64 was capable of up-regulating ErbB-2 expression, but only in 50%
of the cells, which is likely to be due to the low infection efficiency. pE3Y showed specific
up- and down-regulation when fused to VP64 or KRAB, respectively, which was as
efficient as the previously reported pE3. The pE3Z fusion proteins did not alter gene
expression of erbB-3, even though pE3Z had the highest affinity of the 3 generated proteins.
The zinc finger domains described herein, which specifically recognize 5'-ANN-3' DNA
sequences, greatly increase the number of 6-finger proteins that can now be constructed
and DNA sequences that can be targeted by zinc finger-based transcription factors.

Example 1: Construction of zinc finger library and selection via phage display 

Construction of the zinc finger library was based on the earlier described C7 protein
([Wu et al., (1995) PNAS 92, 344-348]; Fig. 1). Finger 3 recognizing the 5'-GCG-3' subsite
was replaced by a domain binding to a 5'-GAT-3' subsite [Segal et al., (1999) Proc Natl
Acad Sci USA 96(6), 2758-2763] via an overlap PCR strategy using a primer coding for
finger 3 (5'-GAGGAAGTTTGCCACCAGTGGCAACCTGGTGAGGCATACCAAAATC-3')
(SEQ ID NO: 104) and a pMal-specific primer (5'-GTAAAACGACGGCCAGTGCCAAGC-3')
(SEQ ID NO: 105). Randomization of the zinc
finger library by PCR overlap extension was essentially as described [Wu et al., (1995)
PNAS 92, 344-348; Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. The
library was ligated into the phagemid vector pComb3H [Rader et al., (1997) Curr. Opin.
Biotechnol. 8(4), 503-508]. Growth and precipitation of phage were performed as
previously described [Barbas et al., (1991) Methods: Companion Methods Enzymol. 2(2),
119-124; Barbas et al., (1991) Proc. Natl. Acad. Sci. USA 88, 7978-7982; Segal et al.,
(1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. Binding reactions were performed in
a volume of 500 µl zinc buffer A (ZBA: 10 mM Tris, pH 7.5/90 mM KCl/1 mM MgCl2/90
µM ZnCl2)/0.2% BSA/5 mM DTT/1% Blotto (Biorad)/20 µg double-stranded, sheared
herring sperm DNA containing 100 µl precipitated phage (10^^ colony-forming units).
Phage were allowed to bind to non-biotinylated competitor oligonucleotides for 1 hr at 4°C
before the biotinylated target oligonucleotide was added. Binding continued overnight at
4°C. After incubation with 50 µl streptavidin-coated magnetic beads (Dynal; blocked with
5% Blotto in ZBA) for 1 hr, beads were washed ten times with 500 µl ZBA/2% Tween
20/5 mM DTT, and once with buffer containing no Tween. Elution of bound phage was
performed by incubation in 25 µl trypsin (10 µg/ml) in TBS (Tris-buffered saline) for 30
min at room temperature. Hairpin competitor oligonucleotides had the sequence
5'-GGCCGCN'N'N'ATCGAGTTTTCTCGATNNNGCGGCC-3' (SEQ ID NO: 106) (target
oligonucleotides were biotinylated), where NNN represents the finger-2 subsite
oligonucleotides, N'N'N' its complementary bases. Target oligonucleotides were usually
added at 72 nM in the first three rounds of selection, then decreased to 36 nM and 18 nM in
the sixth and last round. As competitor a 5'-TGG-3' finger-2 subsite oligonucleotide was
used to compete with the parental clone. An equimolar mixture of 15 finger-2 5'-ANN-3'
subsites, except for the target site, respectively, and competitor mixtures of each finger-2
subsites of the type 5'-CNN-3', 5'-GNN-3', and 5'-TNN-3' were added in increasing
amounts with each successive round of selection. Usually no specific 5'-ANN-3'
competitor mix was added in the first round.
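The hairpin oligonucleotide design and the 5'-ANN-3' competitor set can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used to order the oligonucleotides: the function names are hypothetical, and N'N'N' is interpreted as the reverse complement of NNN so that the two arms of the hairpin, written 5' to 3', can base-pair.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_oligo(subsite: str) -> str:
    """Build 5'-GGCCGC N'N'N' ATCGAG TTTT CTCGAT NNN GCGGCC-3' for one finger-2 subsite."""
    if len(subsite) != 3 or set(subsite) - set("ACGT"):
        raise ValueError("subsite must be three bases from A, C, G, T")
    return (
        "GGCCGC" + reverse_complement(subsite) + "ATCGAG"  # 5' arm of the stem
        + "TTTT"                                           # loop
        + "CTCGAT" + subsite + "GCGGCC"                    # 3' arm carrying GAT-NNN-GCG
    )


# All sixteen finger-2 subsites of the type 5'-ANN-3'.
ANN_SUBSITES = ["A" + "".join(pair) for pair in product("ACGT", repeat=2)]


def ann_competitors(target: str) -> list[str]:
    """The fifteen 5'-ANN-3' subsites other than the selection target."""
    return [s for s in ANN_SUBSITES if s != target]


target = "AGA"  # example target chosen for illustration
print(hairpin_oligo(target))
print(ann_competitors(target))
```

The 3' arm of each hairpin contains the 9 bp site 5'-GAT NNN GCG-3', so finger 2 of the library faces the variable subsite while fingers 1 and 3 contact constant flanking triplets.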

Multitarget Specificity Assay and Gel mobility shift analysis - The zinc
finger-coding sequence was subcloned from pComb3H into a modified bacterial expression vector
pMal-c2 (New England Biolabs). After transformation into XL1-Blue (Stratagene) the zinc
finger-maltose-binding protein (MBP) fusions were expressed after addition of 1 mM
isopropyl β-D-thiogalactoside (IPTG). Freeze/thaw extracts of these bacterial cultures
were applied in 1:2 dilutions to 96-well plates coated with streptavidin (Pierce), and were
tested for DNA-binding specificity against each of the sixteen 5'-GAT ANN GCG-3' target
sites, respectively. ELISA (enzyme-linked immunosorbent assay) was performed
essentially as described [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763;
Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. After incubation with a mouse anti-MBP
(maltose-binding protein) antibody (Sigma, 1:1000), a goat anti-mouse antibody coupled
with alkaline phosphatase (Sigma, 1:1000) was applied. Detection followed by addition of
alkaline phosphatase substrate (Sigma), and the OD405 was determined with
SOFTMAX2.35 (Molecular Devices).

Gelshift analysis was performed with purified protein (Protein Fusion and 
Purification System, New England Biolabs) essentially as described. 

Example 2: Site-directed mutagenesis of finger 2

Finger-2 mutants were constructed by PCR as described [Segal et al., (1999) Proc
Natl Acad Sci USA 96(6), 2758-2763; Dreier et al., (2000) J. Mol. Biol. 303, 489-502].
As PCR template, the library clone containing 5'-TGG-3' finger 2 and 5'-GAT-3' finger 3
was used. PCR products containing a mutagenized finger 2 and 5'-GAT-3' finger 3 were
subcloned via NsiI and SpeI restriction sites in frame with finger 1 of C7 into a modified
pMal-c2 vector (New England Biolabs).

Construction of polydactyl zinc finger proteins - Three-finger proteins were
constructed by finger-2 stitchery using the Sp1C framework as described [Beerli et al.,
(1998) Proc Natl Acad Sci USA 95(25), 14628-14633]. The proteins generated in this
work contained helices recognizing 5'-GNN-3' DNA sequences [Segal et al., (1999) Proc
Natl Acad Sci USA 96(6), 2758-2763], as well as 5'-ANN-3' and 5'-TAG-3' helices
described here. Six-finger proteins were assembled via compatible XmaI and BsrFI
restriction sites. Analysis of DNA-binding properties was performed from IPTG-induced
freeze/thaw bacterial extracts. For the analysis of the capability of these proteins to regulate
gene expression, they were fused to the activation domain VP64 or repression domain
KRAB of Kox-1 as described earlier ([Beerli et al., (1998) Proc Natl Acad Sci USA
95(25), 14628-14633; Beerli et al., (2000) Proc Natl Acad Sci USA 97(4), 1495-1500;
Beerli et al., (2000) J. Biol. Chem. 275(42), 32617-32627]; VP64: tetrameric repeat of
herpes simplex virus VP16 minimal activation domain) and subcloned into pcDNA3 or
the retroviral pMX-IRES-GFP vector ([Liu et al., (1997) Proc. Natl. Acad. Sci. USA 94,
10669-10674]; IRES, internal ribosome-entry site; GFP, green fluorescent protein).

Example 3: General Methods

Transfection and luciferase assays

HeLa cells were used at a confluency of 40-60%. Cells were transfected with 160
ng reporter plasmid (pGL3-promoter constructs) and 40 ng of effector plasmid (zinc
finger-effector domain fusions in pcDNA3) in 24-well plates. Cell extracts were prepared
48 hrs after transfection and measured with luciferase assay reagent (Promega) in a
MicroLumat LB96P luminometer (EG&G Berthold, Gaithersburg, MD).

Retroviral gene targeting and Flow cytometric analysis

These assays were performed as described [Beerli et al., (2000) Proc Natl Acad Sci
USA 97(4), 1495-1500; Beerli et al., (2000) J. Biol. Chem. 275(42), 32617-32627]. As
primary antibody an ErbB-1-specific mAb EGFR (Santa Cruz), ErbB-2-specific mAb
FSP77 (gift from Nancy E. Hynes; Harwerth et al., 1992) and an ErbB-3-specific mAb
SGP1 (Oncogene Research Products) were used. Fluorescently labeled donkey F(ab')2
anti-mouse IgG was used as secondary antibody (Jackson Immuno-Research).

Computer modeling

Computer models were generated using InsightII (Molecular Simulations, Inc.).
Models were based on the coordinates of the co-crystal structures of Zif268-DNA (PDB
accession 1AAY) and QGSR-GCAC (SEQ ID NO: 107) (1A1H). The structures were not
energy minimized and are presented only to suggest possible interactions. Hydrogen bonds
were considered plausible when the distance between the heavy atoms was 3 (+/- 0.3) Å and
the angle formed by the heavy atoms and hydrogen was 120° or greater. Plausible van der
Waals interactions required a distance between methyl group carbon atoms of 4 (+/- 0.3) Å.
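The geometric criteria above translate directly into simple coordinate checks. The sketch below is illustrative only: it is not the procedure used in the modeling software, the function names and example coordinates are hypothetical, and it assumes that the angle in question is the donor-hydrogen-acceptor angle measured at the hydrogen atom.

```python
import math
from typing import Sequence


def angle_deg(a: Sequence[float], vertex: Sequence[float], b: Sequence[float]) -> float:
    """Angle a-vertex-b in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def plausible_hbond(donor, hydrogen, acceptor,
                    target=3.0, tol=0.3, min_angle=120.0) -> bool:
    """Heavy-atom distance within target +/- tol (in Å) and D-H...A angle >= min_angle."""
    heavy_atom_distance = math.dist(donor, acceptor)
    return (abs(heavy_atom_distance - target) <= tol
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)


def plausible_vdw(carbon1, carbon2, target=4.0, tol=0.3) -> bool:
    """Methyl carbon-carbon distance within target +/- tol (in Å)."""
    return abs(math.dist(carbon1, carbon2) - target) <= tol


# Hypothetical coordinates in Å, chosen only to exercise the checks.
donor, hydrogen, acceptor = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)
print(plausible_hbond(donor, hydrogen, acceptor))
print(plausible_vdw((0.0, 0.0, 0.0), (4.1, 0.0, 0.0)))
```

Because the structures were not energy minimized, such checks can only flag contacts as plausible; they do not establish that an interaction occurs.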
WHAT IS CLAIMED IS:

1. A polypeptide comprising from 2 to 12 zinc finger-nucleotide binding peptides
at least one of which peptides contains a nucleotide binding region having the sequence of
any of SEQ ID NO: 7-70 and 107-112.

2. The polypeptide of claim 1 containing from 2 to 6 zinc finger-nucleotide
binding peptides.

3. The polypeptide of claim 1 wherein each of the peptides binds to a different 
target nucleotide sequence. 

4. The polypeptide of claim 2 that binds to a nucleotide that contains the
sequence 5'-(ANN)n-3', wherein each N is A, C, G, or T and where n is 2 to 6.

5. The polypeptide of claim 1 further operatively linked to one or more
transcription regulating factors.

6. The polypeptide of claim 1 wherein each of the peptides contains a
nucleotide binding region having the sequence of any of SEQ ID NO: 46-70.

7. The polypeptide of claim 1 wherein each of the peptides contains a
nucleotide binding region having the sequence of any of SEQ ID NO: 7-45.

8. The polypeptide of claim 1 wherein each of the peptides contains a
nucleotide binding region having the sequence of any of SEQ ID NO: 10, 11, 17, 19, 21, 23-
30, 32, 34-36, 42, 43 or 45.

9. An isolated and purified polynucleotide that encodes the polypeptide of
claim 1.

10. An expression vector containing the polynucleotide of claim 6. 

11. A process of regulating expression of a nucleotide sequence that contains
the sequence 5'-(ANN)n-3', where n is an integer from 2 to 12, the process comprising
exposing the nucleotide sequence to an effective amount of the polypeptide of claim 1.

12. The process of claim 10 wherein the sequence 5'-(ANN)n-3' is located in the
transcribed region of the nucleotide sequence.

13. The process of claim 10 wherein the sequence 5'-(ANN)n-3' is located in a
promoter region of the nucleotide sequence.

14. The process of claim 10 wherein the sequence 5'-(ANN)n-3' is located within
an expressed sequence tag.

15. The process of claim 10 wherein the polypeptide is operatively linked to one
or more transcription regulating factors.